Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 6321 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 642.1 KiB |
| Average record size in memory | 104.0 B |
Variable types
| Numeric | 9 |
|---|---|
| Categorical | 4 |
Alerts
| Alert | Type |
|---|---|
| Bidder_ID has a high cardinality: 1054 distinct values | High cardinality |
| Bidding_Ratio is highly correlated with Successive_Outbidding and 2 other fields | High correlation |
| Last_Bidding is highly correlated with Early_Bidding | High correlation |
| Auction_Bids is highly correlated with Starting_Price_Average | High correlation |
| Starting_Price_Average is highly correlated with Auction_Bids | High correlation |
| Early_Bidding is highly correlated with Last_Bidding | High correlation |
| Winning_Ratio is highly correlated with Bidding_Ratio and 1 other field | High correlation |
| Successive_Outbidding is highly correlated with Bidding_Ratio and 1 other field | High correlation |
| Class is highly correlated with Bidding_Ratio and 2 other fields | High correlation |
| Record_ID has unique values | Unique |
| Bidder_Tendency has 153 (2.4%) zeros | Zeros |
| Auction_Bids has 2823 (44.7%) zeros | Zeros |
| Starting_Price_Average has 3253 (51.5%) zeros | Zeros |
| Winning_Ratio has 3640 (57.6%) zeros | Zeros |
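The percentages in the Zeros alerts above can be re-derived from the zero counts and the number of observations (6321). The following is a minimal sketch of that arithmetic; the dictionary and variable names are illustrative and not part of the report.

```python
# Counts copied from the Zeros alerts; n_observations from the dataset statistics.
n_observations = 6321
zero_counts = {
    "Bidder_Tendency": 153,
    "Auction_Bids": 2823,
    "Starting_Price_Average": 3253,
    "Winning_Ratio": 3640,
}

# Share of zeros per column, as a percentage rounded to one decimal place.
zero_share = {
    name: round(100 * count / n_observations, 1)
    for name, count in zero_counts.items()
}

for name, share in zero_share.items():
    print(f"{name}: {share}%")
```

More than half of the rows are zero for Starting_Price_Average and Winning_Ratio, which is worth keeping in mind when reading their means and quantiles.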
Reproduction
| Analysis started | 2022-12-28 22:02:05.868048 |
|---|---|
| Analysis finished | 2022-12-28 22:02:19.290624 |
| Duration | 13.42 seconds |
| Software version | pandas-profiling v3.4.0 |
Record_ID
Real number (ℝ≥0)
| Distinct | 6321 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7535.829457 |
| Minimum | 1 |
| Maximum | 15144 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 49.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 741 |
| Q1 | 3778 |
| median | 7591 |
| Q3 | 11277 |
| 95-th percentile | 14376 |
| Maximum | 15144 |
| Range | 15143 |
| Interquartile range (IQR) | 7499 |
Descriptive statistics
| Standard deviation | 4364.759137 |
|---|---|
| Coefficient of variation (CV) | 0.5792008911 |
| Kurtosis | -1.194715162 |
| Mean | 7535.829457 |
| Median Absolute Deviation (MAD) | 3744 |
| Skewness | 0.005441796984 |
| Sum | 47633978 |
| Variance | 19051122.33 |
| Monotonicity | Strictly increasing |
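The summary figures in the statistics tables for this variable are tied together by simple identities (mean = sum / n, CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, range = maximum - minimum). The sketch below checks them with the values copied from the tables; the tolerance is an assumption chosen to absorb the rounding of the printed numbers.

```python
import math

# Values copied from the tables above; n is the number of observations.
n = 6321
total = 47633978
mean = 7535.829457
std = 4364.759137
variance = 19051122.33
cv = 0.5792008911
q1, q3, iqr = 3778, 11277, 7499
minimum, maximum, value_range = 1, 15144, 15143

# Assumed relative tolerance for the rounded, printed figures.
TOL = 1e-8

checks = {
    "mean = sum / n": math.isclose(total / n, mean, rel_tol=TOL),
    "cv = std / mean": math.isclose(std / mean, cv, rel_tol=TOL),
    "variance = std ** 2": math.isclose(std ** 2, variance, rel_tol=TOL),
    "iqr = q3 - q1": q3 - q1 == iqr,
    "range = max - min": maximum - minimum == value_range,
}

for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'mismatch'}")
```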
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 1 | 1 | < 0.1% |
| 10015 | 1 | < 0.1% |
| 10013 | 1 | < 0.1% |
| 10011 | 1 | < 0.1% |
| 10009 | 1 | < 0.1% |
| 10005 | 1 | < 0.1% |
| 10004 | 1 | < 0.1% |
| 10003 | 1 | < 0.1% |
| 10002 | 1 | < 0.1% |
| 10001 | 1 | < 0.1% |
| Other values (6311) | 6311 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 8 | 1 | |
| 10 | 1 | |
| 12 | 1 | |
| 13 | 1 | |
| 27 | 1 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 15144 | 1 | |
| 15139 | 1 | |
| 15138 | 1 | |
| 15137 | 1 | |
| 15129 | 1 | |
| 15128 | 1 | |
| 15124 | 1 | |
| 15123 | 1 | |
| 15121 | 1 | |
| 15115 | 1 |
Auction_ID
Real number (ℝ≥0)
| Distinct | 807 |
|---|---|
| Distinct (%) | 12.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1241.38823 |
| Minimum | 5 |
| Maximum | 2538 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 49.5 KiB |
Quantile statistics
| Minimum | 5 |
|---|---|
| 5-th percentile | 121 |
| Q1 | 589 |
| median | 1246 |
| Q3 | 1867 |
| 95-th percentile | 2434 |
| Maximum | 2538 |
| Range | 2533 |
| Interquartile range (IQR) | 1278 |
Descriptive statistics
| Standard deviation | 735.770789 |
|---|---|
| Coefficient of variation (CV) | 0.5926999881 |
| Kurtosis | -1.203514078 |
| Mean | 1241.38823 |
| Median Absolute Deviation (MAD) | 646 |
| Skewness | 0.04581737075 |
| Sum | 7846815 |
| Variance | 541358.654 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 589 | 26 | 0.4% |
| 1872 | 26 | 0.4% |
| 256 | 24 | 0.4% |
| 658 | 24 | 0.4% |
| 2498 | 23 | 0.4% |
| 32 | 23 | 0.4% |
| 1623 | 23 | 0.4% |
| 1863 | 22 | 0.3% |
| 1119 | 22 | 0.3% |
| 860 | 22 | 0.3% |
| Other values (797) | 6086 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 5 | 9 | 0.1% |
| 6 | 9 | 0.1% |
| 8 | 3 | < 0.1% |
| 9 | 9 | 0.1% |
| 17 | 11 | |
| 23 | 6 | 0.1% |
| 25 | 8 | 0.1% |
| 30 | 5 | 0.1% |
| 31 | 8 | 0.1% |
| 32 | 23 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 2538 | 3 | < 0.1% |
| 2536 | 7 | |
| 2534 | 6 | |
| 2531 | 14 | |
| 2530 | 4 | 0.1% |
| 2529 | 3 | < 0.1% |
| 2527 | 10 | |
| 2516 | 11 | |
| 2512 | 11 | |
| 2508 | 4 | 0.1% |
Bidder_ID
Categorical
| Distinct | 1054 |
|---|---|
| Distinct (%) | 16.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 49.5 KiB |
Length
| Max length | 5 |
|---|---|
| Median length | 5 |
| Mean length | 5 |
| Min length | 5 |
Characters and Unicode
| Total characters | 31605 |
|---|---|
| Distinct characters | 41 |
| Distinct categories | 6 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
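To illustrate how character properties lead to category counts like the ones in this section, here is a minimal sketch using Python's standard `unicodedata` module. It is not necessarily what the profiler uses internally, and the five masked IDs are simply the values from this variable's Sample table.

```python
import unicodedata
from collections import Counter

# Masked bidder IDs taken from the Sample table of this variable (illustrative only).
sample_ids = ["_***i", "g***r", "t***p", "7***n", "z***z"]

# unicodedata.category returns two-letter general category codes, for example:
#   Ll = Lowercase Letter, Nd = Decimal Number,
#   Po = Other Punctuation, Pc = Connector Punctuation.
category_counts = Counter(
    unicodedata.category(ch) for value in sample_ids for ch in value
)

for code, count in category_counts.most_common():
    print(code, count)
```

The masking character `*` falls under Other Punctuation, which is why that category dominates the character counts for this column.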
Unique
| Unique | 276 |
|---|---|
| Unique (%) | 4.4% |
Sample
| 1st row | _***i |
|---|---|
| 2nd row | g***r |
| 3rd row | t***p |
| 4th row | 7***n |
| 5th row | z***z |
Common Values
| Value | Count | Frequency (%) |
| a***a | 112 | 1.8% |
| n***t | 85 | 1.3% |
| e***e | 67 | 1.1% |
| i***a | 50 | 0.8% |
| r***r | 49 | 0.8% |
| l***l | 48 | 0.8% |
| o***o | 45 | 0.7% |
| i***i | 44 | 0.7% |
| r***h | 43 | 0.7% |
| r***a | 41 | 0.6% |
| Other values (1044) | 5737 |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| a***a | 112 | 1.8% |
| n***t | 85 | 1.3% |
| e***e | 67 | 1.1% |
| i***a | 50 | 0.8% |
| r***r | 49 | 0.8% |
| l***l | 48 | 0.8% |
| o***o | 45 | 0.7% |
| i***i | 44 | 0.7% |
| a | 43 | 0.7% |
| r***h | 43 | 0.7% |
| Other values (955) | 5735 |
Most occurring characters
| Value | Count | Frequency (%) |
| * | 18972 | |
| a | 1288 | 4.1% |
| e | 810 | 2.6% |
| i | 746 | 2.4% |
| r | 739 | 2.3% |
| n | 675 | 2.1% |
| o | 644 | 2.0% |
| l | 582 | 1.8% |
| s | 575 | 1.8% |
| t | 537 | 1.7% |
| Other values (31) | 6037 | 19.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Other Punctuation | 19027 | |
| Lowercase Letter | 10318 | |
| Decimal Number | 1891 | 6.0% |
| Connector Punctuation | 196 | 0.6% |
| Dash Punctuation | 172 | 0.5% |
| Currency Symbol | 1 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 1288 | 12.5% |
| e | 810 | 7.9% |
| i | 746 | 7.2% |
| r | 739 | 7.2% |
| n | 675 | 6.5% |
| o | 644 | 6.2% |
| l | 582 | 5.6% |
| s | 575 | 5.6% |
| t | 537 | 5.2% |
| m | 431 | 4.2% |
| Other values (16) | 3291 |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 442 | |
| 1 | 306 | |
| 2 | 222 | |
| 8 | 159 | 8.4% |
| 9 | 152 | 8.0% |
| 7 | 149 | 7.9% |
| 4 | 129 | 6.8% |
| 6 | 117 | 6.2% |
| 3 | 114 | 6.0% |
| 5 | 101 | 5.3% |
Other Punctuation
| Value | Count | Frequency (%) |
| * | 18972 | |
| . | 55 | 0.3% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 196 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 172 |
Currency Symbol
| Value | Count | Frequency (%) |
| $ | 1 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 21287 | |
| Latin | 10318 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 1288 | 12.5% |
| e | 810 | 7.9% |
| i | 746 | 7.2% |
| r | 739 | 7.2% |
| n | 675 | 6.5% |
| o | 644 | 6.2% |
| l | 582 | 5.6% |
| s | 575 | 5.6% |
| t | 537 | 5.2% |
| m | 431 | 4.2% |
| Other values (16) | 3291 |
Common
| Value | Count | Frequency (%) |
| * | 18972 | |
| 0 | 442 | 2.1% |
| 1 | 306 | 1.4% |
| 2 | 222 | 1.0% |
| _ | 196 | 0.9% |
| - | 172 | 0.8% |
| 8 | 159 | 0.7% |
| 9 | 152 | 0.7% |
| 7 | 149 | 0.7% |
| 4 | 129 | 0.6% |
| Other values (5) | 388 | 1.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 31605 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| * | 18972 | |
| a | 1288 | 4.1% |
| e | 810 | 2.6% |
| i | 746 | 2.4% |
| r | 739 | 2.3% |
| n | 675 | 2.1% |
| o | 644 | 2.0% |
| l | 582 | 1.8% |
| s | 575 | 1.8% |
| t | 537 | 1.7% |
| Other values (31) | 6037 | 19.1% |
Bidder_Tendency
Real number (ℝ≥0)
| Distinct | 489 |
|---|---|
| Distinct (%) | 7.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.1425407372 |
| Minimum | 0 |
| Maximum | 1 |
| Zeros | 153 |
| Zeros (%) | 2.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 49.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.008196721 |
| Q1 | 0.027027027 |
| median | 0.0625 |
| Q3 | 0.166666667 |
| 95-th percentile | 0.545454545 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0.13963964 |
Descriptive statistics
| Standard deviation | 0.1970835553 |
|---|---|
| Coefficient of variation (CV) | 1.382647229 |
| Kurtosis | 6.833835322 |
| Mean | 0.1425407372 |
| Median Absolute Deviation (MAD) | 0.046106557 |
| Skewness | 2.531213439 |
| Sum | 901.0000001 |
| Variance | 0.03884192777 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 0.25 | 190 | 3.0% |
| 0.2 | 186 | 2.9% |
| 0.5 | 162 | 2.6% |
| 0 | 153 | 2.4% |
| 0.333333333 | 142 | 2.2% |
| 0.1 | 131 | 2.1% |
| 0.111111111 | 124 | 2.0% |
| 1 | 123 | 1.9% |
| 0.090909091 | 120 | 1.9% |
| 0.142857143 | 116 | 1.8% |
| Other values (479) | 4874 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 153 | |
| 0.003021148 | 53 | 0.8% |
| 0.006042296 | 14 | 0.2% |
| 0.006802721 | 19 | 0.3% |
| 0.007246377 | 45 | 0.7% |
| 0.008196721 | 46 | 0.7% |
| 0.008264463 | 18 | 0.3% |
| 0.008403361 | 14 | 0.2% |
| 0.008695652 | 41 | 0.6% |
| 0.009063444 | 9 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 1 | 123 | |
| 0.954545455 | 1 | < 0.1% |
| 0.95 | 1 | < 0.1% |
| 0.944444444 | 1 | < 0.1% |
| 0.928571429 | 1 | < 0.1% |
| 0.916666667 | 2 | < 0.1% |
| 0.913043478 | 1 | < 0.1% |
| 0.909090909 | 1 | < 0.1% |
| 0.9 | 2 | < 0.1% |
| 0.896551724 | 1 | < 0.1% |
Bidding_Ratio
Real number (ℝ≥0)
| Distinct | 400 |
|---|---|
| Distinct (%) | 6.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.1276696725 |
| Minimum | 0.011764706 |
| Maximum | 1 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 49.5 KiB |
Quantile statistics
| Minimum | 0.011764706 |
|---|---|
| 5-th percentile | 0.020833333 |
| Q1 | 0.043478261 |
| median | 0.083333333 |
| Q3 | 0.166666667 |
| 95-th percentile | 0.4 |
| Maximum | 1 |
| Range | 0.988235294 |
| Interquartile range (IQR) | 0.123188406 |
Descriptive statistics
| Standard deviation | 0.1315304218 |
|---|---|
| Coefficient of variation (CV) | 1.030240144 |
| Kurtosis | 7.350693708 |
| Mean | 0.1276696725 |
| Median Absolute Deviation (MAD) | 0.046296296 |
| Skewness | 2.421639032 |
| Sum | 807.0000002 |
| Variance | 0.01730025187 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 0.166666667 | 233 | 3.7% |
| 0.2 | 211 | 3.3% |
| 0.083333333 | 189 | 3.0% |
| 0.125 | 180 | 2.8% |
| 0.111111111 | 175 | 2.8% |
| 0.090909091 | 168 | 2.7% |
| 0.142857143 | 159 | 2.5% |
| 0.071428571 | 152 | 2.4% |
| 0.1 | 150 | 2.4% |
| 0.058823529 | 144 | 2.3% |
| Other values (390) | 4560 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0.011764706 | 29 | |
| 0.012658228 | 6 | 0.1% |
| 0.013157895 | 7 | 0.1% |
| 0.015384615 | 12 | |
| 0.015873016 | 18 | |
| 0.016129032 | 20 | |
| 0.016393443 | 6 | 0.1% |
| 0.016949153 | 11 | 0.2% |
| 0.017241379 | 7 | 0.1% |
| 0.01754386 | 24 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 1 | 5 | |
| 0.961538462 | 1 | < 0.1% |
| 0.886363636 | 1 | < 0.1% |
| 0.875 | 2 | < 0.1% |
| 0.857142857 | 3 | |
| 0.833333333 | 3 | |
| 0.825 | 1 | < 0.1% |
| 0.818181818 | 1 | < 0.1% |
| 0.80952381 | 1 | < 0.1% |
| 0.8 | 6 |
Successive_Outbidding
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 49.5 KiB |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 18963 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 0.0 |
| 3rd row | 0.0 |
| 4th row | 0.0 |
| 5th row | 0.0 |
Common Values
| Value | Count | Frequency (%) |
| 0.0 | 5478 | 86.7% |
| 1.0 | 469 | 7.4% |
| 0.5 | 374 | 5.9% |
Length: histogram of lengths of the category (figure)
Category frequency plot (figure; same counts as the Common Values table)
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 11799 | |
| . | 6321 | |
| 1 | 469 | 2.5% |
| 5 | 374 | 2.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 12642 | |
| Other Punctuation | 6321 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 11799 | |
| 1 | 469 | 3.7% |
| 5 | 374 | 3.0% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 6321 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 18963 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 11799 | |
| . | 6321 | |
| 1 | 469 | 2.5% |
| 5 | 374 | 2.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 18963 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 11799 | |
| . | 6321 | |
| 1 | 469 | 2.5% |
| 5 | 374 | 2.0% |
Last_Bidding
Real number (ℝ≥0)
| Distinct | 5807 |
|---|---|
| Distinct (%) | 91.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.463119377 |
| Minimum | 0 |
| Maximum | 0.999900463 |
| Zeros | 8 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 49.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 4.62963 × 10⁻⁵ |
| Q1 | 0.0479282407 |
| median | 0.4409375 |
| Q3 | 0.8603626543 |
| 95-th percentile | 0.9901342593 |
| Maximum | 0.999900463 |
| Range | 0.999900463 |
| Interquartile range (IQR) | 0.8124344136 |
Descriptive statistics
| Standard deviation | 0.3800972294 |
|---|---|
| Coefficient of variation (CV) | 0.8207327274 |
| Kurtosis | -1.608880478 |
| Mean | 0.463119377 |
| Median Absolute Deviation (MAD) | 0.4035119048 |
| Skewness | 0.08771046524 |
| Sum | 2927.377582 |
| Variance | 0.1444739038 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 2.31481 × 10⁻⁵ | 29 | 0.5% |
| 1.15741 × 10⁻⁵ | 25 | 0.4% |
| 9.9206 × 10⁻⁶ | 19 | 0.3% |
| 8.2672 × 10⁻⁶ | 17 | 0.3% |
| 8.10185 × 10⁻⁵ | 17 | 0.3% |
| 3.47222 × 10⁻⁵ | 16 | 0.3% |
| 4.62963 × 10⁻⁵ | 15 | 0.2% |
| 4.9603 × 10⁻⁶ | 15 | 0.2% |
| 6.9444 × 10⁻⁶ | 13 | 0.2% |
| 1.38889 × 10⁻⁵ | 12 | 0.2% |
| Other values (5797) | 6143 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 8 | |
| 1.6534 × 10⁻⁶ | 6 | 0.1% |
| 2.3148 × 10⁻⁶ | 4 | 0.1% |
| 3.3069 × 10⁻⁶ | 12 | |
| 3.4722 × 10⁻⁶ | 2 | < 0.1% |
| 3.858 × 10⁻⁶ | 4 | 0.1% |
| 4.6296 × 10⁻⁶ | 9 | |
| 4.9603 × 10⁻⁶ | 15 | |
| 6.6138 × 10⁻⁶ | 7 | |
| 6.9444 × 10⁻⁶ | 13 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 0.999900463 | 1 | |
| 0.9998842593 | 1 | |
| 0.9998396164 | 1 | |
| 0.9998240741 | 1 | |
| 0.9998197751 | 1 | |
| 0.9998082011 | 1 | |
| 0.9996990741 | 1 | |
| 0.9996015212 | 1 | |
| 0.9995821759 | 1 | |
| 0.9995601852 | 1 |
Auction_Bids
Real number (ℝ≥0)
| Distinct | 49 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.2316057484 |
| Minimum | 0 |
| Maximum | 0.788235294 |
| Zeros | 2823 |
| Zeros (%) | 44.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 49.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0.142857143 |
| Q3 | 0.454545455 |
| 95-th percentile | 0.678571429 |
| Maximum | 0.788235294 |
| Range | 0.788235294 |
| Interquartile range (IQR) | 0.454545455 |
Descriptive statistics
| Standard deviation | 0.2552520147 |
|---|---|
| Coefficient of variation (CV) | 1.102097061 |
| Kurtosis | -1.156312208 |
| Mean | 0.2316057484 |
| Median Absolute Deviation (MAD) | 0.142857143 |
| Skewness | 0.5872723251 |
| Sum | 1463.979936 |
| Variance | 0.06515359099 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=49)
Common values
| Value | Count | Frequency (%) |
| 0 | 2823 | |
| 0.142857143 | 208 | 3.3% |
| 0.25 | 166 | 2.6% |
| 0.419354839 | 166 | 2.6% |
| 0.28 | 158 | 2.5% |
| 0.052631579 | 149 | 2.4% |
| 0.333333333 | 143 | 2.3% |
| 0.1 | 143 | 2.3% |
| 0.307692308 | 127 | 2.0% |
| 0.217391304 | 122 | 1.9% |
| Other values (39) | 2116 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 2823 | |
| 0.052631579 | 149 | 2.4% |
| 0.1 | 143 | 2.3% |
| 0.142857143 | 208 | 3.3% |
| 0.181818182 | 112 | 1.8% |
| 0.217391304 | 122 | 1.9% |
| 0.25 | 166 | 2.6% |
| 0.28 | 158 | 2.5% |
| 0.307692308 | 127 | 2.0% |
| 0.333333333 | 143 | 2.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 0.788235294 | 68 | |
| 0.772151899 | 16 | 0.3% |
| 0.763157895 | 17 | 0.3% |
| 0.723076923 | 24 | 0.4% |
| 0.714285714 | 34 | |
| 0.709677419 | 49 | |
| 0.704918033 | 13 | 0.2% |
| 0.694915254 | 17 | 0.3% |
| 0.689655172 | 16 | 0.3% |
| 0.684210526 | 57 |
Starting_Price_Average
Real number (ℝ≥0)
| Distinct | 22 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.4728213414 |
| Minimum | 0 |
| Maximum | 0.999935281 |
| Zeros | 3253 |
| Zeros (%) | 51.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 49.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0.993592814 |
| 95-th percentile | 0.999935281 |
| Maximum | 0.999935281 |
| Range | 0.999935281 |
| Interquartile range (IQR) | 0.993592814 |
Descriptive statistics
| Standard deviation | 0.4899121676 |
|---|---|
| Coefficient of variation (CV) | 1.036146478 |
| Kurtosis | -1.973158938 |
| Mean | 0.4728213414 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 0.08641184491 |
| Sum | 2988.703699 |
| Variance | 0.240013932 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=22)
Common values
| Value | Count | Frequency (%) |
| 0 | 3253 | |
| 0.993592814 | 1921 | |
| 0.999935281 | 345 | 5.5% |
| 0.993528095 | 322 | 5.1% |
| 0.935345672 | 89 | 1.4% |
| 0.676404764 | 48 | 0.8% |
| 0.99935281 | 43 | 0.7% |
| 0.998899776 | 36 | 0.6% |
| 0.98712091 | 34 | 0.5% |
| 0.825258573 | 29 | 0.5% |
| Other values (12) | 201 | 3.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 3253 | |
| 0.514607147 | 22 | 0.3% |
| 0.54696667 | 15 | 0.2% |
| 0.553050261 | 15 | 0.2% |
| 0.676404764 | 48 | 0.8% |
| 0.676469483 | 24 | 0.4% |
| 0.805842859 | 5 | 0.1% |
| 0.825258573 | 29 | 0.5% |
| 0.929585677 | 3 | < 0.1% |
| 0.935280953 | 26 | 0.4% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 0.999935281 | 345 | 5.5% |
| 0.99935281 | 43 | 0.7% |
| 0.998899776 | 36 | 0.6% |
| 0.995146071 | 13 | 0.2% |
| 0.993592814 | 1921 | |
| 0.993528095 | 322 | 5.1% |
| 0.98712091 | 34 | 0.5% |
| 0.967705195 | 11 | 0.2% |
| 0.967640476 | 29 | 0.5% |
| 0.961233291 | 23 | 0.4% |
Early_Bidding
Real number (ℝ≥0)
| Distinct | 5690 |
|---|---|
| Distinct (%) | 90.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.4306825513 |
| Minimum | 0 |
| Maximum | 0.999900463 |
| Zeros | 15 |
| Zeros (%) | 0.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 49.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2.31481 × 10⁻⁵ |
| Q1 | 0.0266203704 |
| median | 0.3601041667 |
| Q3 | 0.8267607564 |
| 95-th percentile | 0.9886998457 |
| Maximum | 0.999900463 |
| Range | 0.999900463 |
| Interquartile range (IQR) | 0.800140386 |
Descriptive statistics
| Standard deviation | 0.3807851666 |
|---|---|
| Coefficient of variation (CV) | 0.8841434729 |
| Kurtosis | -1.582106675 |
| Mean | 0.4306825513 |
| Median Absolute Deviation (MAD) | 0.354728836 |
| Skewness | 0.2204189586 |
| Sum | 2722.344407 |
| Variance | 0.1449973431 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 2.31481 × 10⁻⁵ | 42 | 0.7% |
| 1.15741 × 10⁻⁵ | 30 | 0.5% |
| 9.9206 × 10⁻⁶ | 25 | 0.4% |
| 3.47222 × 10⁻⁵ | 25 | 0.4% |
| 4.62963 × 10⁻⁵ | 21 | 0.3% |
| 4.9603 × 10⁻⁶ | 20 | 0.3% |
| 3.3069 × 10⁻⁶ | 18 | 0.3% |
| 8.2672 × 10⁻⁶ | 18 | 0.3% |
| 6.94444 × 10⁻⁵ | 15 | 0.2% |
| 6.9444 × 10⁻⁶ | 15 | 0.2% |
| Other values (5680) | 6092 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 15 | |
| 1.6534 × 10⁻⁶ | 12 | |
| 2.3148 × 10⁻⁶ | 6 | 0.1% |
| 3.3069 × 10⁻⁶ | 18 | |
| 3.4722 × 10⁻⁶ | 2 | < 0.1% |
| 3.858 × 10⁻⁶ | 9 | |
| 4.6296 × 10⁻⁶ | 10 | |
| 4.9603 × 10⁻⁶ | 20 | |
| 6.6138 × 10⁻⁶ | 14 | |
| 6.9444 × 10⁻⁶ | 15 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 0.999900463 | 1 | |
| 0.9998688272 | 1 | |
| 0.9998396164 | 1 | |
| 0.9997962963 | 1 | |
| 0.9996990741 | 1 | |
| 0.9996015212 | 1 | |
| 0.9995601852 | 1 | |
| 0.9995300926 | 1 | |
| 0.9994907407 | 1 | |
| 0.9994791667 | 1 |
Winning_Ratio
Real number (ℝ≥0)
| Distinct | 72 |
|---|---|
| Distinct (%) | 1.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.3677311985 |
| Minimum | 0 |
| Maximum | 1 |
| Zeros | 3640 |
| Zeros (%) | 57.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 49.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0.851851852 |
| 95-th percentile | 1 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0.851851852 |
Descriptive statistics
| Standard deviation | 0.4365734667 |
|---|---|
| Coefficient of variation (CV) | 1.187208125 |
| Kurtosis | -1.737443619 |
| Mean | 0.3677311985 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 0.4041892864 |
| Sum | 2324.428906 |
| Variance | 0.1905963918 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 0 | 3640 | |
| 1 | 826 | 13.1% |
| 0.75 | 143 | 2.3% |
| 0.833333333 | 110 | 1.7% |
| 0.666666667 | 107 | 1.7% |
| 0.8 | 78 | 1.2% |
| 0.857142857 | 74 | 1.2% |
| 0.916666667 | 69 | 1.1% |
| 0.5 | 62 | 1.0% |
| 0.866666667 | 58 | 0.9% |
| Other values (62) | 1154 | 18.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 3640 | |
| 0.333333333 | 12 | 0.2% |
| 0.4 | 3 | < 0.1% |
| 0.5 | 62 | 1.0% |
| 0.555555556 | 9 | 0.1% |
| 0.571428571 | 12 | 0.2% |
| 0.6 | 21 | 0.3% |
| 0.611111111 | 11 | 0.2% |
| 0.615384615 | 3 | < 0.1% |
| 0.625 | 2 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 1 | 826 | |
| 0.976744186 | 8 | 0.1% |
| 0.954545455 | 12 | 0.2% |
| 0.95 | 5 | 0.1% |
| 0.947368421 | 11 | 0.2% |
| 0.944444444 | 11 | 0.2% |
| 0.941176471 | 9 | 0.1% |
| 0.9375 | 6 | 0.1% |
| 0.935483871 | 10 | 0.2% |
| 0.933333333 | 40 | 0.6% |
Auction_Duration
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 49.5 KiB |
Length
| Max length | 2 |
|---|---|
| Median length | 1 |
| Mean length | 1.021673786 |
| Min length | 1 |
Characters and Unicode
| Total characters | 6458 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 5 |
|---|---|
| 2nd row | 5 |
| 3rd row | 5 |
| 4th row | 5 |
| 5th row | 7 |
Common Values
| Value | Count | Frequency (%) |
| 7 | 2427 | 38.4% |
| 3 | 1408 | 22.3% |
| 1 | 1289 | 20.4% |
| 5 | 1060 | 16.8% |
| 10 | 137 | 2.2% |
Length: histogram of lengths of the category (figure)
Category frequency plot (figure; same counts as the Common Values table)
Most occurring characters
| Value | Count | Frequency (%) |
| 7 | 2427 | |
| 1 | 1426 | |
| 3 | 1408 | |
| 5 | 1060 | |
| 0 | 137 | 2.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 6458 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 7 | 2427 | |
| 1 | 1426 | |
| 3 | 1408 | |
| 5 | 1060 | |
| 0 | 137 | 2.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 6458 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 7 | 2427 | |
| 1 | 1426 | |
| 3 | 1408 | |
| 5 | 1060 | |
| 0 | 137 | 2.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 6458 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 7 | 2427 | |
| 1 | 1426 | |
| 3 | 1408 | |
| 5 | 1060 | |
| 0 | 137 | 2.1% |
Class
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 49.5 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 6321 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 5646 | 89.3% |
| 1 | 675 | 10.7% |
Length: histogram of lengths of the category (figure)
Category frequency plot (figure; same counts as the Common Values table)
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 5646 | |
| 1 | 675 | 10.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 6321 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 5646 | |
| 1 | 675 | 10.7% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 6321 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 5646 | |
| 1 | 675 | 10.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 6321 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 5646 | |
| 1 | 675 | 10.7% |
Auto
The auto setting is an easily interpretable pairwise column metric that applies the following mapping (vartype-vartype : method): categorical-categorical : Cramér's V; numerical-categorical : Cramér's V (using a discretized numerical column); numerical-numerical : Spearman's ρ. This configuration uses the most suitable method for each pair of columns.
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
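The rank-based definition above can be sketched in a few lines of plain Python. This is an illustrative implementation under the assumption that tied values receive the average of their ranks; the helper names (`average_ranks`, `pearson`, `spearman`) are hypothetical and this is not the code the report itself runs.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# A monotonic but nonlinear relationship: y = x ** 3.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman(x, y)
r = pearson(x, y)
print(f"rho={rho:.4f}, r={r:.4f}")
```

On this toy data ρ reaches 1 because the relationship is perfectly monotonic, while Pearson's r stays below 1 because it is not linear.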
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
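The invariance property can be checked numerically. The sketch below follows the definition literally (covariance over the product of standard deviations) on made-up numbers; the function name `pearson_r` and the data are assumptions for illustration only.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Population covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return covariance / (pstdev(x) * pstdev(y))

# Made-up illustrative data, not taken from the report.
x = [0.2, 0.5, 0.1, 0.9, 0.4]
y = [1.0, 2.1, 0.7, 3.9, 1.8]

r_original = pearson_r(x, y)
# Changing location (+3) and scale (*10) of x leaves r unchanged.
r_shifted = pearson_r([10 * v + 3 for v in x], y)
# A negative scale factor keeps the magnitude but flips the sign.
r_flipped = pearson_r([-v for v in x], y)

print(r_original, r_shifted, r_flipped)
```

Note that the invariance holds for positive scale factors; multiplying one variable by a negative number reverses the sign of r.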
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
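The pair-counting formula above corresponds to the simple variant often called tau-a, which applies no correction for ties. The following sketch implements exactly that variant on toy data; library implementations commonly use tau-b instead, so results can differ when ties are present. The function name is hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1      # both variables move in the same direction
        elif product < 0:
            discordant += 1      # the variables move in opposite directions
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs

# Toy data: two adjacent swaps relative to a perfect ordering.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau_a(x, y)
print(tau)
```

Here 8 of the 10 pairs are concordant and 2 are discordant, giving τ = (8 - 2) / 10.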
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
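Returning to the bias-corrected Cramér's V mentioned in the correlation notes above, the sketch below shows one common way to write Bergsma's (2013) correction for a contingency table of counts. It is an illustration under the assumption that every row and column total is positive; the function name and the two toy tables are hypothetical, and the report's own implementation may differ in detail.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic (assumes all expected counts are positive).
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Bergsma's correction of phi squared and of the table dimensions.
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

# Toy tables: perfect association and exact independence.
v_perfect = cramers_v_corrected([[10, 0], [0, 10]])
v_independent = cramers_v_corrected([[5, 5], [5, 5]])
print(v_perfect, v_independent)
```

For a numerical-categorical pair, the numerical column would first be discretized into bins, and the resulting bin-by-category counts would form the table passed to this function.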
First rows
| | Record_ID | Auction_ID | Bidder_ID | Bidder_Tendency | Bidding_Ratio | Successive_Outbidding | Last_Bidding | Auction_Bids | Starting_Price_Average | Early_Bidding | Winning_Ratio | Auction_Duration | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 732 | _***i | 0.200000 | 0.400000 | 0.0 | 0.000028 | 0.000000 | 0.993593 | 0.000028 | 0.666667 | 5 | 0 |
| 1 | 2 | 732 | g***r | 0.024390 | 0.200000 | 0.0 | 0.013123 | 0.000000 | 0.993593 | 0.013123 | 0.944444 | 5 | 0 |
| 2 | 3 | 732 | t***p | 0.142857 | 0.200000 | 0.0 | 0.003042 | 0.000000 | 0.993593 | 0.003042 | 1.000000 | 5 | 0 |
| 3 | 4 | 732 | 7***n | 0.100000 | 0.200000 | 0.0 | 0.097477 | 0.000000 | 0.993593 | 0.097477 | 1.000000 | 5 | 0 |
| 4 | 5 | 900 | z***z | 0.051282 | 0.222222 | 0.0 | 0.001318 | 0.000000 | 0.000000 | 0.001242 | 0.500000 | 7 | 0 |
| 5 | 8 | 900 | i***e | 0.038462 | 0.111111 | 0.0 | 0.016844 | 0.000000 | 0.000000 | 0.016844 | 0.800000 | 7 | 0 |
| 6 | 10 | 900 | m***p | 0.400000 | 0.222222 | 0.0 | 0.006781 | 0.000000 | 0.000000 | 0.006774 | 0.750000 | 7 | 0 |
| 7 | 12 | 900 | k***a | 0.137931 | 0.444444 | 1.0 | 0.768044 | 0.000000 | 0.000000 | 0.016311 | 1.000000 | 7 | 1 |
| 8 | 13 | 2370 | g***r | 0.121951 | 0.185185 | 1.0 | 0.035021 | 0.333333 | 0.993528 | 0.023963 | 0.944444 | 7 | 1 |
| 9 | 27 | 600 | e***t | 0.155172 | 0.346154 | 0.5 | 0.570994 | 0.307692 | 0.993593 | 0.413788 | 0.611111 | 7 | 1 |
Last rows
| | Record_ID | Auction_ID | Bidder_ID | Bidder_Tendency | Bidding_Ratio | Successive_Outbidding | Last_Bidding | Auction_Bids | Starting_Price_Average | Early_Bidding | Winning_Ratio | Auction_Duration | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6311 | 15115 | 760 | s***l | 0.028571 | 0.040000 | 0.0 | 0.057203 | 0.280000 | 0.993593 | 0.057203 | 0.000000 | 3 | 0 |
| 6312 | 15121 | 927 | v***- | 0.181818 | 0.095238 | 0.0 | 0.955694 | 0.142857 | 0.000000 | 0.955313 | 0.000000 | 1 | 0 |
| 6313 | 15123 | 760 | c***e | 0.111111 | 0.040000 | 0.0 | 0.569726 | 0.280000 | 0.993593 | 0.569726 | 0.000000 | 3 | 0 |
| 6314 | 15124 | 760 | q***0 | 0.625000 | 0.200000 | 0.5 | 0.473063 | 0.280000 | 0.993593 | 0.431902 | 0.500000 | 3 | 0 |
| 6315 | 15128 | 760 | i***l | 0.022222 | 0.040000 | 0.0 | 0.629606 | 0.280000 | 0.993593 | 0.629606 | 0.000000 | 3 | 0 |
| 6316 | 15129 | 760 | l***t | 0.333333 | 0.160000 | 1.0 | 0.738557 | 0.280000 | 0.993593 | 0.686358 | 0.888889 | 3 | 1 |
| 6317 | 15137 | 2481 | s***s | 0.030612 | 0.130435 | 0.0 | 0.005754 | 0.217391 | 0.993593 | 0.000010 | 0.878788 | 7 | 0 |
| 6318 | 15138 | 2481 | h***t | 0.055556 | 0.043478 | 0.0 | 0.015663 | 0.217391 | 0.993593 | 0.015663 | 0.000000 | 7 | 0 |
| 6319 | 15139 | 2481 | d***d | 0.076923 | 0.086957 | 0.0 | 0.068694 | 0.217391 | 0.993593 | 0.000415 | 0.000000 | 7 | 0 |
| 6320 | 15144 | 2481 | a***l | 0.016393 | 0.043478 | 0.0 | 0.340351 | 0.217391 | 0.993593 | 0.340351 | 0.000000 | 7 | 0 |